1. Introduction



An isomorphism between two graphs is a bijection between their vertex sets that preserves adjacency. An automorphism is an isomorphism from a graph to itself. The set of all automorphisms of a graph $G$ forms a group under composition called the automorphism group $\mathrm{Aut}(G)$.

The graph isomorphism problem (GI) is that of determining whether there is an isomorphism between two given graphs. GI has long been a favorite target of algorithm designers, so much so that it was already described as a "disease" in 1976 (Read and Corneil 1977).



Though it is not the focus of this paper, we summarize the current state of the theoretical study of graph isomorphism. It is obvious that GI $\in$ NP but unknown whether GI $\in$ co-NP. As that implies, no polynomial time algorithm is known (despite many published claims), but neither is GI known to be NP-complete. NP-completeness is considered unlikely since it would imply collapse of the polynomial-time hierarchy (Goldreich et al. 1991). The fastest proven running time for GI has stood for three decades at $e^{O(\sqrt{n \log n})}$ (Babai et al. 1983). On the other hand, polynomial time algorithms are known for many special classes of graphs. The most general such classes are those with a forbidden minor (Ponomarenko 1988; Grohe 2010) and those with a forbidden topological minor (Grohe 2012). These classes include many earlier classes such as graphs of bounded degree (Luks 1982), bounded genus (Filotti and Mayer 1980; Miller 1980) and bounded tree-width (Bodlaender 1990). The algorithms resulting from this theory are most unlikely to be useful in practice. Only for a very few important graph classes, such as trees (Aho et al. 1974) and planar graphs (Colbourn and Booth 1981), are there practical approaches which are sure to outperform general methods such as described in this paper.

Testing two graphs for isomorphism directly can have the advantage that an isomor- 
phism might be found long before an exhaustive search is complete. On the other hand, 
it is poorly suited for the common problems of rejecting isomorphs from a collection of 
graphs or identifying a graph in a database of graphs. For this reason, the most common 
practical approach is "canonical labelling", a process in which a graph is relabelled in such
a way that isomorphic graphs are identical after relabelling. When we have an efficient 
canonical labelling procedure, we can use a sorting algorithm for removing isomorphs 
from a large collection and standard data structures for database retrieval. 

It is impossible to comprehensively survey the history of this problem since there are 
at least a few hundred published algorithms. However, a clear truth of history is that 
the most successful approach has involved fixing of vertices together with refinement of 
partitions of the vertex set. This "individualization-refinement" paradigm was introduced 
by Parris and Read (1969) and developed by Corneil and Gotlieb (1970) and Arlazarov et al. (1974). However, the first program that could handle both structurally regular graphs with hundreds of vertices and graphs with large automorphism groups was that of McKay (1978b, 1980), which later became known as nauty. The main advantage of nauty over earlier programs was its innovative use of automorphisms to prune the search. Although there were some worthy competitors (Leon 1990; Kocay 1996), nauty dominated the field for the next several decades.



This situation changed when Darga et al. (2004) introduced saucy, which at that stage was essentially a reimplementation of the automorphism group subset of nauty using sparse data structures. This gave it a very large advantage for many graphs of practical interest, prompting the first author to release a version of nauty for sparse graphs. Saucy has since introduced some important innovations, such as the ability to detect some types of automorphism (such as those implied by a locally tree-like structure) very early (Darga et al. 2008). Soon afterwards Junttila and Kaski (2007, 2011) introduced Bliss, which also used the same algorithm but had some extra ideas that helped its performance on difficult graphs. In particular, it allowed refinement operations to be aborted early in some cases. The latter idea reached its full expression in Traces, which we introduce in this paper. More importantly, Traces pioneered a major revision of the way the search tree is scanned, which we will demonstrate to produce great efficiency gains.

Another program worthy of consideration is conauto (Lopez-Presa and Fernandez Anta 2009; Lopez-Presa et al. 2011). It does not feature canonical labelling, though it can compare two graphs for isomorphism.

In Section 2, we provide a description of algorithms based on the individualization-refinement paradigm. It is sufficiently general to encompass the primary structure of all of the most successful algorithms. In Section 3, we flesh out the details of how nauty and Traces are implemented, with emphasis on how they differ from each other. In Section 4, we
compare the performance of nauty and Traces with Bliss, saucy and conauto when 
applied to a variety of families of graphs ranging from those traditionally easy to the 
most difficult known. Although none of the programs is the fastest in all cases, we will 
see that nauty is generally the fastest for small graphs and some easier families, while 
Traces is better, sometimes in dramatic fashion, for most of the difficult graph families. 



2. Generic Algorithm 

In this section, we give formal definitions of colourings (partitions), invariants, and 
group actions. We then define the search tree which is at the heart of most recent graph 
isomorphism algorithms and explain how it enables computation of automorphism groups 
and canonical forms. This section is intended to be a self-contained introduction to the 
overall strategy and does not contain new features. 

Let $\mathcal{G} = \mathcal{G}_n$ denote the set of graphs with vertex set $V = \{1, 2, \ldots, n\}$.

2.1. Colourings 

A colouring of $V$ (or of $G \in \mathcal{G}$) is a surjective function $\pi$ from $V$ onto $\{1, 2, \ldots, k\}$ for some $k$. The number of colours, i.e. $k$, is denoted by $|\pi|$. A cell of $\pi$ is the set of vertices with some given colour; that is, $\pi^{-1}(j)$ for some $j$ with $1 \le j \le |\pi|$. A discrete colouring is a colouring in which each cell is a singleton, in which case $|\pi| = n$. Note that a discrete colouring is a permutation of $V$.

If $\pi, \pi'$ are colourings, then $\pi'$ is finer than or equal to $\pi$, written $\pi' \preceq \pi$, if $\pi(v) < \pi(w) \Rightarrow \pi'(v) < \pi'(w)$ for all $v, w \in V$. (This implies that each cell of $\pi'$ is a subset of a cell of $\pi$, but the converse is not true.)

Since a colouring partitions V into cells, it is frequently called a partition. However, 
note that the colours come in a particular order and this matters when defining concepts 
like "finer". 

A pair $(G, \pi)$, where $\pi$ is a colouring of $G$, is called a coloured graph.
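
The definitions of this subsection can be sketched in a few lines of Python. This is only an illustration under our own assumptions: a colouring is stored as a dict from vertices to colours, and the helper names `cells`, `is_discrete` and `is_finer_or_equal` are hypothetical, not taken from any of the programs discussed in this paper.

```python
def cells(pi):
    """Return the cells of colouring pi (dict vertex -> colour), in colour order."""
    k = max(pi.values())
    return [sorted(v for v in pi if pi[v] == j) for j in range(1, k + 1)]

def is_discrete(pi):
    """A colouring is discrete when every cell is a singleton, i.e. |pi| = n."""
    return len(set(pi.values())) == len(pi)

def is_finer_or_equal(pi2, pi1):
    """Check that pi1(v) < pi1(w) implies pi2(v) < pi2(w) for all v, w."""
    return all(pi2[v] < pi2[w]
               for v in pi1 for w in pi1 if pi1[v] < pi1[w])

pi = {1: 1, 2: 1, 3: 2, 4: 2}        # cells {1,2} and {3,4}
finer = {1: 1, 2: 2, 3: 3, 4: 4}     # discrete, respects the colour order of pi
swapped = {1: 3, 2: 4, 3: 1, 4: 2}   # cells are subsets of cells of pi, order reversed
```

The colouring `swapped` shows why the converse fails: each of its cells lies inside a cell of `pi`, yet it is not finer than or equal to `pi` because the order of the colours is not respected.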
2.2. Group actions and isomorphisms



Let $S_n$ denote the symmetric group acting on $V$. We indicate the action of elements of $S_n$ by exponentiation. That is, for $v \in V$ and $g \in S_n$, $v^g$ is the image of $v$ under $g$. The same notation indicates the induced action on complex structures derived from $V$; in particular:

(a) If $W \subseteq V$, then $W^g = \{w^g : w \in W\}$, and similarly for sequences.

(b) If $G \in \mathcal{G}$, then $G^g \in \mathcal{G}$ has $v^g$ adjacent to $w^g$ exactly when $v$ and $w$ are adjacent in $G$. As a special case, a discrete colouring $\pi$ is a permutation on $V$ so we can write $G^{\pi}$.

(c) If $\pi$ is a colouring of $V$, then $\pi^g$ is the colouring with $\pi^g(v) = \pi(v^g)$ for each $v \in V$.

(d) If $(G, \pi)$ is a coloured graph, then $(G, \pi)^g = (G^g, \pi^g)$.

Two coloured graphs $(G, \pi)$, $(G', \pi')$ are isomorphic if there is $g \in S_n$ such that $(G', \pi') = (G, \pi)^g$, in which case we write $(G, \pi) \cong (G', \pi')$. Such a $g$ is called an isomorphism. The automorphism group $\mathrm{Aut}(G, \pi)$ is the group of isomorphisms of the coloured graph $(G, \pi)$ to itself; that is,

$$\mathrm{Aut}(G, \pi) = \{g \in S_n : (G, \pi)^g = (G, \pi)\}.$$

A canonical form is a function

$$C : \mathcal{G} \times \Pi \to \mathcal{G} \times \Pi$$

such that, for all $G \in \mathcal{G}$, $\pi \in \Pi$ and $g \in S_n$,
(C1) $C(G, \pi) \cong (G, \pi)$,
(C2) $C(G^g, \pi^g) = C(G, \pi)$.

In other words, it assigns to each coloured graph an isomorphic coloured graph that is a unique representative of its isomorphism class. It follows from the definition that $(G, \pi) \cong (G', \pi') \Leftrightarrow C(G, \pi) = C(G', \pi')$.

Property (C2) is an important property that must be satisfied by many functions 
we define. It says that if the elements of V appearing in the inputs to the function are 
renamed in some manner, the elements of V appearing in the function value are renamed 
in the same manner. We call this label-invariance. 
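
To make label-invariance concrete, here is a deliberately naive sketch of a canonical form for uncoloured graphs (that is, with the trivial one-cell colouring). It is our own illustration, not the method of any program discussed here: it tries all $n!$ relabellings and keeps the lexicographically smallest edge list, so it is usable only for tiny graphs. The function names `act` and `brute_force_canon` are hypothetical.

```python
from itertools import permutations

def act(edges, g):
    """Image G^g of a graph given as a set of 2-element frozensets; g maps vertex -> vertex."""
    return frozenset(frozenset(g[v] for v in e) for e in edges)

def brute_force_canon(edges, n):
    """Smallest relabelled edge list over all n! permutations (exponential, illustration only)."""
    best = None
    for p in permutations(range(1, n + 1)):
        g = dict(zip(range(1, n + 1), p))
        key = sorted(tuple(sorted(e)) for e in act(edges, g))
        if best is None or key < best:
            best = key
    return best

path = frozenset({frozenset({1, 2}), frozenset({2, 3})})
g = {1: 2, 2: 3, 3: 1}
relabelled = act(path, g)                      # the same path with renamed vertices
triangle = frozenset({frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})})
```

Because the minimum is taken over the whole isomorphism class, renaming the vertices of the input cannot change the result, which is property (C2) in this restricted setting. Isomorphic graphs in a collection can then be removed by sorting on the canonical form.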

2.3. Search tree 

Now we define a rooted tree whose nodes correspond to sequences of vertices, with the 
empty sequence at the root of the tree. The sequences become longer as we move down 
the tree. Each sequence corresponds to a colouring of the graph obtained by giving the 
vertices in the sequence unique colours then inferring in a controlled fashion a colouring 
of the other vertices. Leaves of the tree correspond to sequences for which the derived 
colouring is discrete. 

To formally define the tree, we first define a "refinement function" that specifies the 
colouring that corresponds to a sequence. Let $V^*$ denote the set of finite sequences of vertices. For $\nu \in V^*$, $|\nu|$ denotes the number of components of $\nu$. If $\nu = (v_1, \ldots, v_k) \in V^*$ and $w \in V$, then $\nu \,\|\, w$ denotes $(v_1, \ldots, v_k, w)$. Furthermore, for $0 \le s \le k$, $[\nu]_s = (v_1, \ldots, v_s)$. The ordering $\le$ on finite sequences is the lexicographic order: if $\nu = (v_1, \ldots, v_k)$ and $\nu' = (v'_1, \ldots, v'_\ell)$, then $\nu \le \nu'$ if $\nu$ is a prefix of $\nu'$ or there is some $j \le \min\{k, \ell\}$ such that $v_i = v'_i$ for $i < j$ and $v_j < v'_j$.
A refinement function is a function

$$R : \mathcal{G} \times \Pi \times V^* \to \Pi$$

such that for any $G \in \mathcal{G}$, $\pi \in \Pi$ and $\nu \in V^*$,
(R1) $R(G, \pi, \nu) \preceq \pi$;

(R2) if $v \in \nu$, then $\{v\}$ is a cell of $R(G, \pi, \nu)$;

(R3) for any $g \in S_n$, we have $R(G^g, \pi^g, \nu^g) = R(G, \pi, \nu)^g$.

To complete the definition of the tree, we need to specify what are the children of each 
node. We do this by choosing one non-singleton cell of the colouring, called the target 
cell, and appending an element of it to the sequence. 

A target cell selector chooses a non-singleton cell of a colouring, if there is one. Formally, it is a function

$$T : \mathcal{G} \times \Pi \times V^* \to 2^V$$

such that for any $\pi_0 \in \Pi$, $G \in \mathcal{G}$ and $\nu \in V^*$,

(T1) if $R(G, \pi_0, \nu)$ is discrete, then $T(G, \pi_0, \nu) = \emptyset$;

(T2) if $R(G, \pi_0, \nu)$ is not discrete, then $T(G, \pi_0, \nu)$ is a non-singleton cell of $R(G, \pi_0, \nu)$;
(T3) for any $g \in S_n$, we have $T(G^g, \pi_0^g, \nu^g) = T(G, \pi_0, \nu)^g$.

Now we can define the search tree $\mathcal{T}(G, \pi_0)$ depending on an initially-specified coloured graph $(G, \pi_0)$. The nodes of the tree are elements of $V^*$.

(a) The root of $\mathcal{T}(G, \pi_0)$ is the empty sequence $(\,)$.

(b) If $\nu$ is a node of $\mathcal{T}(G, \pi_0)$, let $W = T(G, \pi_0, \nu)$. Then the children of $\nu$ are

$$\{\nu \,\|\, w : w \in W\}.$$

This definition implies by (T2) that a node $\nu$ of $\mathcal{T}(G, \pi_0)$ is a leaf iff $R(G, \pi_0, \nu)$ is discrete.

For any node $\nu$ of $\mathcal{T}(G, \pi_0)$, define $\mathcal{T}(G, \pi_0, \nu)$ to be the subtree of $\mathcal{T}(G, \pi_0)$ consisting of $\nu$ and all its descendants. The following lemmas are easily derived using induction from the definition of the search tree and the properties of the functions $R$, $T$ and $I$.

Lemma 1. For any $G \in \mathcal{G}$, $\pi_0 \in \Pi$, $g \in S_n$, we have $\mathcal{T}(G^g, \pi_0^g) = \mathcal{T}(G, \pi_0)^g$.

Proof. Let $\nu = (v_1, \ldots, v_k)$ be a node of $\mathcal{T}(G, \pi_0)$. It is easily proved by induction on $s$ that $[\nu]_s^g$ is a node of $\mathcal{T}(G^g, \pi_0^g)$ for $0 \le s \le k$. Therefore, $\mathcal{T}(G, \pi_0)^g \subseteq \mathcal{T}(G^g, \pi_0^g)$. The reverse inclusion follows on considering $g^{-1}$ instead, so the lemma is proved. □

Corollary 2. Let $\nu$ be a node of $\mathcal{T}(G, \pi_0)$ and let $g \in \mathrm{Aut}(G, \pi_0)$. Then $\nu^g$ is a node of $\mathcal{T}(G, \pi_0)$ and $\mathcal{T}(G, \pi_0, \nu^g) = \mathcal{T}(G, \pi_0, \nu)^g$.

Proof. This follows from Lemma 1 on noticing that $(G, \pi_0)^g = (G, \pi_0)$ if $g \in \mathrm{Aut}(G, \pi_0)$. □

Lemma 3. Let $\nu$ be a node of $\mathcal{T}(G, \pi_0)$ and let $\pi = R(G, \pi_0, \nu)$. Then $\mathrm{Aut}(G, \pi)$ is the point-wise stabilizer of $\nu$ in $\mathrm{Aut}(G, \pi_0)$.

Proof. By condition (R2), every element of $\mathrm{Aut}(G, \pi)$ stabilizes $\nu$. Conversely, suppose $g \in \mathrm{Aut}(G, \pi_0)$ stabilizes $\nu$. Then by (R3), $\pi^g = R(G, \pi_0, \nu)^g = R(G, \pi_0, \nu) = \pi$, so $g \in \mathrm{Aut}(G, \pi)$. □
2.4. Automorphisms and canonical forms

Now we describe how the search tree $\mathcal{T}(G, \pi_0)$, defined as in the previous subsection, can be used to compute $\mathrm{Aut}(G, \pi_0)$ and a canonical form.

Let $\Omega$ be some totally ordered set. A node invariant is a function

$$\phi : \mathcal{G} \times \Pi \times V^* \to \Omega,$$

such that for any $\pi_0 \in \Pi$, $G \in \mathcal{G}$, and distinct $\nu, \nu' \in \mathcal{T}(G, \pi_0)$,

($\phi$1) if $|\nu| = |\nu'|$ and $\phi(G, \pi_0, \nu) < \phi(G, \pi_0, \nu')$, then for every leaf $\nu_1 \in \mathcal{T}(G, \pi_0, \nu)$ and leaf $\nu'_1 \in \mathcal{T}(G, \pi_0, \nu')$ we have $\phi(G, \pi_0, \nu_1) < \phi(G, \pi_0, \nu'_1)$;
($\phi$2) if $\pi = R(G, \pi_0, \nu)$ and $\pi' = R(G, \pi_0, \nu')$ are discrete, then $\phi(G, \pi_0, \nu) = \phi(G, \pi_0, \nu') \Leftrightarrow G^{\pi} = G^{\pi'}$ (note that the last relation is equality, not isomorphism);
($\phi$3) for any $g \in S_n$, we have $\phi(G^g, \pi_0^g, \nu^g) = \phi(G, \pi_0, \nu)$.

Say that leaves $\nu, \nu'$ are equivalent if $\phi(G, \pi_0, \nu) = \phi(G, \pi_0, \nu')$. If this is the case, there is a unique $g \in \mathrm{Aut}(G, \pi_0)$ such that $\nu^g = \nu'$, namely $g = R(G, \pi_0, \nu') R(G, \pi_0, \nu)^{-1}$. (Recall that $R(G, \pi_0, \nu)$ is a permutation if $\nu$ is a leaf.)

According to Corollary 2, if $\nu$ is a leaf of $\mathcal{T}(G, \pi_0)$, then so is $\nu^g$ for any $g \in \mathrm{Aut}(G, \pi_0)$. Moreover, by the properties of $\phi$ these leaves (over $g \in \mathrm{Aut}(G, \pi_0)$) have the same value of $\phi$ and no other leaf has that value. Consequently, for any leaf $\nu$,

$$\mathrm{Aut}(G, \pi_0) = \{R(G, \pi_0, \nu') R(G, \pi_0, \nu)^{-1} : \nu' \text{ is a leaf of } \mathcal{T}(G, \pi_0) \text{ with } \phi(G, \pi_0, \nu') = \phi(G, \pi_0, \nu)\}.$$

To define a canonical form, let

$$\phi^*(G, \pi_0) = \max\{\phi(G, \pi_0, \nu) : \nu \text{ is a leaf of } \mathcal{T}(G, \pi_0)\},$$

and let $\nu^*$ be any leaf of $\mathcal{T}(G, \pi_0)$ that achieves the maximum. Now define $C(G, \pi_0) = (G, \pi_0)^{R(G, \pi_0, \nu^*)}$. By the properties of $\phi$, $C(G, \pi_0)$ thus defined is independent of the choice of $\nu^*$. In particular, we have:

Lemma 4. The function

$$C : \mathcal{G} \times \Pi \to \mathcal{G} \times \Pi$$

as just defined is a canonical form.

These observations provide an algorithm for computing $\mathrm{Aut}(G, \pi_0)$ and $C(G, \pi_0)$, once we have defined $T$ and $\phi$. In practice it is not of much use, since the search tree can be extremely large and the group is found element by element rather than as a set of generators. However, in practice we can dramatically improve the performance by judicious pruning of the tree.
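
The unpruned algorithm can be sketched as follows. The sketch makes several simplifying assumptions of our own: vertices are numbered from 0, the initial colouring has a single cell, the refinement function only individualizes the vertices of the sequence (so every ordering of the vertices appears as a leaf), and the invariant is evaluated only at leaves, where it is the relabelled edge list. All function names are hypothetical.

```python
def search_leaves(n):
    """Yield the discrete colouring (as an ordering of the vertices) at each leaf."""
    def walk(prefix, rest):
        if len(rest) <= 1:                    # colouring is discrete: this is a leaf
            yield prefix + rest
            return
        for w in rest:                        # target cell = the only non-singleton cell
            yield from walk(prefix + [w], [u for u in rest if u != w])
    yield from walk([], list(range(n)))

def leaf_invariant(edges, order):
    """Leaf invariant: the relabelled graph G^pi, where pi sends order[i] to i."""
    pi = {v: i for i, v in enumerate(order)}
    return tuple(sorted(tuple(sorted(pi[v] for v in e)) for e in edges)), pi

def aut_and_canon(edges, n):
    """Return (list of all automorphisms, canonical edge list) by scanning every leaf."""
    ls = [leaf_invariant(edges, o) for o in search_leaves(n)]
    phi0, pi0 = ls[0]
    inv0 = {c: v for v, c in pi0.items()}
    # Each leaf equivalent to the first one yields one automorphism v -> pi0^{-1}(pi(v)).
    auts = [{v: inv0[pi[v]] for v in pi} for phi, pi in ls if phi == phi0]
    canon = max(phi for phi, _ in ls)         # graph at a leaf of greatest invariant
    return auts, canon

c4 = [(0, 1), (1, 2), (2, 3), (3, 0)]         # the 4-cycle, with 8 automorphisms
auts, canon = aut_and_canon(c4, 4)
```

As the text notes, this visits every leaf and lists the group element by element, which is exactly why pruning is needed in practice.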

When we refer to a leaf of $\mathcal{T}(G, \pi_0)$, we always mean a node $\nu$ of $\mathcal{T}(G, \pi_0)$ for which $R(G, \pi_0, \nu)$ is discrete, even if our pruning of the tree results in additional nodes having no children.

We define three types of pruning operation on the search tree. 

(A) Suppose $\nu, \nu'$ are distinct nodes of $\mathcal{T}(G, \pi_0)$ with $|\nu| = |\nu'|$ and $\phi(G, \pi_0, \nu) > \phi(G, \pi_0, \nu')$. Operation $P_A(\nu, \nu')$ is to remove $\mathcal{T}(G, \pi_0, \nu')$.

(B) Suppose $\nu, \nu'$ are distinct nodes of $\mathcal{T}(G, \pi_0)$ with $|\nu| = |\nu'|$ and $\phi(G, \pi_0, \nu) \ne \phi(G, \pi_0, \nu')$. Operation $P_B(\nu, \nu')$ is to remove $\mathcal{T}(G, \pi_0, \nu')$.

(C) Suppose $g \in \mathrm{Aut}(G, \pi_0)$ and suppose $\nu < \nu'$ are nodes of $\mathcal{T}(G, \pi_0)$ such that $\nu^g = \nu'$. Operation $P_C(\nu, g)$ is to remove $\mathcal{T}(G, \pi_0, \nu')$.
Theorem 5. Consider any $G \in \mathcal{G}$ and $\pi_0 \in \Pi$.

(a) Suppose any sequence of operations of the form $P_A(\nu, \nu')$ or $P_C(\nu, g)$ is performed. Then there remains at least one leaf $\nu_1$ with $\phi(G, \pi_0, \nu_1) = \phi^*(G, \pi_0)$.

(b) Let $\nu_0$ be some fixed leaf of $\mathcal{T}(G, \pi_0)$. Suppose any sequence of operations of the form $P_B(\nu, \nu')$ or $P_C(\nu'', g)$ is performed, where $\phi(G, \pi_0, \nu) = \phi(G, \pi_0, [\nu_0]_{|\nu|})$. Let $g_1, \ldots, g_k$ be the automorphisms used in the operations $P_C$ that were performed, and let

$$A = \{g \in \mathrm{Aut}(G, \pi_0) : \nu_0^g \text{ is a remaining leaf}\}.$$

Then $\mathrm{Aut}(G, \pi_0)$ is generated by $\{g_1, \ldots, g_k\} \cup A$.

Proof. To prove claim (a), note that the lexicographically least leaf $\nu_1$ with $\phi(G, \pi_0, \nu_1) = \phi^*(G, \pi_0)$ is never removed.

For claim (b), note that the lexicographically least leaf $\nu_1$ equivalent to $\nu_0$ is not removed by the allowed operations. Choose an arbitrary $g \in \mathrm{Aut}(G, \pi_0)$. By Corollary 2, $\nu_0^g$ is a leaf of $\mathcal{T}(G, \pi_0)$. If it has been removed, that must have been by some $P_C(\nu'', g_i)$ with $\nu'' < \nu''^{g_i}$, since operation $P_B(\nu, \nu')$ only removes leaves inequivalent to $\nu_0$. Note that $\nu_0^{g g_i^{-1}}$ is a leaf descended from $\nu''$ and $\nu_0^{g g_i^{-1}} < \nu_0^g$. If $\nu_0^{g g_i^{-1}}$ has been removed, that must have been due to some $P_C(\nu''', g_j)$ with $\nu''' < \nu'''^{g_j}$, so consider the leaf $\nu_0^{g g_i^{-1} g_j^{-1}} < \nu_0^{g g_i^{-1}}$. Continuing in this way we must eventually find a leaf that has not been removed, since the leaf $\nu_1$ is still present. That is, there is some $h \in \langle g_1, \ldots, g_k \rangle$ such that leaf $\nu_0^{gh}$ has not been removed. This proves $g$ belongs to the group generated by $\{g_1, \ldots, g_k\} \cup A$, as we wished to prove. □

The theorem leaves unspecified where the automorphisms for $P_C(\nu, g)$ operations come
from. They might be provided in advance, detected by noticing two leaves are equivalent, 
or otherwise. This is discussed in the following section. 



3. Implementation strategies 

In this section, we describe two implementations of the generic algorithm, which are distributed together as nauty and Traces (McKay and Piperno 2012a).



3.1. Refinement 

Let $G \in \mathcal{G}$. A colouring of $G$ is called equitable if any two vertices of the same colour are adjacent to the same number of vertices of each colour.²

It is well known that for every colouring $\pi$ there is a coarsest equitable colouring $\pi'$ such that $\pi' \preceq \pi$, and that $\pi'$ is unique up to the order of its cells. An algorithm for computing $\pi'$ appears in McKay (1980). We summarize it in Algorithm 1.



Let $F(G, \pi, \alpha)$ be the function defined by Algorithm 1, which we assume to be implemented in a label-invariant manner. Now define the function

$$I : \Pi \times V \to \Pi,$$

such that, if $v$ is a vertex in a non-singleton cell of $\pi$ and $\pi' = I(\pi, v)$, then for $w \in V$,

$$\pi'(w) = \begin{cases} \pi(w), & \text{if } \pi(w) < \pi(v) \text{ or } w = v; \\ \pi(w) + 1, & \text{otherwise.} \end{cases}$$

We see that $I(\pi, v)$ differs from $\pi$ in that a unique colour has been given to vertex $v$.

2 Unfortunately, "equitable colouring" also has another meaning in graph theory. More commonly, our concept is called an equitable partition.

Algorithm 1: Refinement algorithm $F(G, \pi, \alpha)$

    Data: π is the input colouring and α is a sequence of some cells of π
    Result: the final value of π is the output colouring

    while α is not empty and π is not discrete do
        Remove some element W from α.
        for each cell X of π do
            Let X_1, ..., X_k be the fragments of X distinguished according
                to the number of edges from each vertex to W.
            Replace X by X_1, ..., X_k in π.
            if X ∈ α then
                Replace X by X_1, ..., X_k in α.
            else
                Add all but one of the largest of X_1, ..., X_k to α.
            end
        end
    end

Fig. 1. Example of an equitable colouring
Now we can define a refinement function. For a sequence of vertices $v_1, v_2, \ldots$, define

$$R(G, \pi_0, (\,)) = F(G, \pi_0, \text{a list of all the cells of } \pi_0),$$
$$R(G, \pi_0, (v_1)) = F(G, I(R(G, \pi_0, (\,)), v_1), (\{v_1\})),$$
$$R(G, \pi_0, (v_1, v_2)) = F(G, I(R(G, \pi_0, (v_1)), v_2), (\{v_2\})),$$
$$R(G, \pi_0, (v_1, v_2, v_3)) = F(G, I(R(G, \pi_0, (v_1, v_2)), v_3), (\{v_3\})),$$

and so on. According to Theorem 2.7 and Lemma 2.8 of McKay (1980), $R$ satisfies (R1)-(R3) and, moreover, $R(G, \pi_0, \nu)$ is equitable.
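
The following Python sketch follows Algorithm 1 and the recursive definition of $R$ above. It is an illustration only, not the nauty or Traces implementation: colourings are ordered lists of cells, the graph is a dict of neighbour sets on vertices $0, \ldots, n-1$, elements of $\alpha$ are taken in first-in order, and fragments are ordered by increasing edge count, which is one label-invariant choice among many. The names `refine`, `individualize` and `R` are ours.

```python
def refine(adj, cells, active):
    """Sketch of Algorithm 1. cells: ordered list of cells; active: the sequence alpha."""
    cells = [list(c) for c in cells]
    active = [frozenset(c) for c in active]
    n = sum(len(c) for c in cells)
    while active and len(cells) < n:
        W = active.pop(0)
        new_cells = []
        for X in cells:
            by_count = {}
            for v in X:                                   # count edges from v into W
                by_count.setdefault(len(adj[v] & W), []).append(v)
            frags = [by_count[c] for c in sorted(by_count)]
            new_cells.extend(frags)
            if len(frags) > 1:
                if frozenset(X) in active:                # replace X by its fragments
                    i = active.index(frozenset(X))
                    active[i:i + 1] = [frozenset(f) for f in frags]
                else:                                     # add all but one largest fragment
                    keep = max(range(len(frags)), key=lambda j: len(frags[j]))
                    active += [frozenset(f) for j, f in enumerate(frags) if j != keep]
        cells = new_cells
    return cells

def individualize(cells, v):
    """I(pi, v): v keeps the colour of its cell, the rest of that cell moves one colour up."""
    out = []
    for X in cells:
        out += [[v], [u for u in X if u != v]] if v in X and len(X) > 1 else [X]
    return out

def R(adj, cells0, seq):
    """Refinement function: refine, then individualize and refine for each vertex of seq."""
    cells = refine(adj, cells0, cells0)
    for v in seq:
        cells = refine(adj, individualize(cells, v), [[v]])
    return cells

path = {0: {1}, 1: {0, 2}, 2: {1}}
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
```

On the path, refinement alone separates the middle vertex from the two ends. On the 4-cycle, which is regular, the one-cell colouring is already equitable and only individualization makes progress.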

In practice most of the execution time of the whole algorithm is devoted to refining 
colourings, so the implementation is critical. Since the splitting of X into fragments can 
be coded more efficiently if $W$ is a singleton, we have found it advantageous to choose singletons out of $\alpha$ in preference to larger cells.




Fig. 2. Individualization of vertex 1 and subsequent refinement 



While the function R defined above is sufficient for many graphs, there are difficult 
classes (see Section 4) for which it does not adequately separate inequivalent vertices. 
Regular graphs are the simplest example, since the colouring with only one colour is 
equitable. A simple way of doing better is to count the number of triangles incident to 
each vertex. In choosing such a strategy, there is a trade-off between the partitioning 
power and the cost. nauty provides a small library of stronger partitioning functions,
some of them designed for particular classes of difficult graphs. The improvement in 
performance can be very dramatic. On the other hand, choice of which partitioning 
function to employ is left to the user and requires skill, which is not very satisfactory. 
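
As a small illustration of the triangle-counting idea, the sketch below computes the number of triangles through each vertex and uses it to split a cell. The example graph (a triangle together with a disjoint 4-cycle) and the function names are our own; this is not one of the partitioning functions shipped with nauty.

```python
def triangle_counts(adj):
    """Number of triangles through each vertex; adj maps vertex -> set of neighbours."""
    return {v: sum(len(adj[v] & adj[u]) for u in adj[v]) // 2 for v in adj}

def split_by_invariant(cell, inv):
    """Split one cell into fragments ordered by increasing invariant value."""
    values = sorted({inv[v] for v in cell})
    return [[v for v in cell if inv[v] == x] for x in values]

# A 2-regular graph: triangle 0-1-2 plus 4-cycle 3-4-5-6. The one-cell
# colouring is equitable, so ordinary refinement cannot split it.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1},
       3: {4, 6}, 4: {3, 5}, 5: {4, 6}, 6: {5, 3}}
counts = triangle_counts(adj)
fragments = split_by_invariant(list(range(7)), counts)
```

Since the count depends only on the graph structure, the split is label-invariant, and here it separates the two components at once.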

Traces has a different approach to this problem, as we will see in Section 3.3. 

3.2. Target cell selection 

The choice of target cell has a significant effect on the shape of the search tree, and 
thus on performance. A small target cell may perhaps have a greater chance of being an 
orbit of the group which fixes the current stabilizer sequence. For this reason, McKay (1980) recommended using the first smallest non-singleton cell. However, Kocay (1996) found (without realizing it) that using the first non-singleton cell regardless of size was better for most test cases, as confirmed by Kirk (1985). The current version of nauty has two strategies. One is to use the first non-singleton cell, and the other is to choose the first cell which is joined in a non-trivial fashion to the largest number of cells, where a non-trivial join between two cells means that there are more than 0 edges and fewer than the maximum possible.

Traces, on the other hand, prefers large target cells, as they tend to make the tree less
deep. A strategy developed by experiment is to use the first largest non-singleton cell 
that is a subset of the target cell in the parent node. If there are no such non-singleton 
cells, the target cell in the grandparent node is used, and so on, with the first largest cell 
altogether being the last possibility. 
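
The simplest of these selection rules are easy to state in code. The sketch below, with hypothetical function names, takes a colouring as an ordered list of cells and returns the empty list when the colouring is discrete. It covers only the rules based on cell size and position; nauty's rule based on non-trivial joins and the Traces rule that consults the parent's target cell are not shown.

```python
def first_non_singleton(cells):
    """First cell with more than one vertex, or [] if the colouring is discrete."""
    return next((c for c in cells if len(c) > 1), [])

def first_smallest_non_singleton(cells):
    """First cell of minimum size among the non-singleton cells."""
    cand = [c for c in cells if len(c) > 1]
    return min(cand, key=len) if cand else []

def first_largest_non_singleton(cells):
    """First cell of maximum size among the non-singleton cells."""
    cand = [c for c in cells if len(c) > 1]
    return max(cand, key=len) if cand else []

cells = [[0], [1, 2, 3], [4, 5], [6, 7, 8]]
```

All three depend only on the sizes and order of the cells, so they are label-invariant as required by (T3).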

3.3. Node invariants 

Information useful for computing node invariants can come from two related sources. 
At each node $\nu$ there is a colouring $R(G, \pi_0, \nu)$ and we can use properties of this colouring such as the number and size of the cells, as well as combinatorial properties of the coloured graph. Another source is the intermediate states of the computation of a colouring from that of the parent node, such as the order, position and size of the cells produced by the refinement procedure and various counts of edges that are determined during the computation.

If $f(\nu)$ is some function of this information, computed during the computation of $R(G, \pi_0, \nu)$ and from the resulting coloured graph, the vector $(f([\nu]_0), f([\nu]_1), \ldots, f(\nu))$, with lexicographic ordering, satisfies Conditions ($\phi$1) and ($\phi$3) for a node invariant. If $\nu$ is a leaf, we can append $G^{\pi}$, where $\pi$ is the discrete colouring $R(G, \pi_0, \nu)$, to the vector so as to satisfy ($\phi$2) as well.

In nauty, the value of $f(\nu)$ is an integer, and the pruning rules are applied as each node is computed. Traces introduced a major improvement, defining each $f(\nu)$ as a vector itself. The primary components of $f(\nu)$ are the sizes and positions of the cells in the order that they are created by the refinement procedure. $\phi(G, \pi_0, \nu)$ thus becomes a vector of vectors, called the trace (and hence the name "Traces"). The advantage is that it often enables the comparison of $f(\nu)$ and $f(\nu')$ to be made while the computation of $\nu'$ is only partly complete. A limited form of this idea appeared in Bliss (Junttila and Kaski 2007), and also appears in a recent version of saucy (Darga et al. 2008).
For many difficult graph families, only a fraction of all refinement operations need to be 
completed. A practical consequence is that the stronger refinements used by nauty (see 
Section 3.1) are rarely needed. This makes good performance in Traces less dependent 
on user expertise than is the case with nauty. 

If $\pi$ is an equitable colouring of a graph $G$, we can define the quotient graph $Q(G, \pi)$ as follows. The vertices of $Q(G, \pi)$ are the cells of $\pi$, labelled with the cell number and size. For any two cells $C_1, C_2 \in \pi$, possibly equal, the corresponding vertices of $Q(G, \pi)$ are joined by an edge labelled with the number of edges of $G$ between $C_1$ and $C_2$.
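
A quotient graph can be stored as a list of cell sizes together with a matrix of edge counts. The sketch below is our own illustration with a hypothetical function name; it assumes integer vertex labels and a graph given as a dict of neighbour sets.

```python
def quotient(adj, cells):
    """Return (sizes, Q) where Q[i][j] is the number of edges of G between cells i and j."""
    colour = {v: i for i, c in enumerate(cells) for v in c}
    k = len(cells)
    Q = [[0] * k for _ in range(k)]
    for v in adj:
        for u in adj[v]:
            if v < u:                      # visit each undirected edge once
                i, j = colour[v], colour[u]
                Q[i][j] += 1
                if i != j:
                    Q[j][i] += 1
    return [len(c) for c in cells], Q

path = {0: {1}, 1: {0, 2}, 2: {1}}
sizes, Q = quotient(path, [[0, 2], [1]])   # ends of the path, then the middle vertex
```

For the path with cells $\{0, 2\}$ and $\{1\}$, both edges run between the two cells, so the only non-zero entries are off the diagonal.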

The node invariant $\phi(G, \pi_0, \nu)$ computed by Traces, and also by nauty if the standard refinement process Algorithm 1 is used, is a deterministic function of the sequence of quotient graphs $Q(G, R(G, \pi_0, [\nu]_i))$ for $i = 0, \ldots, |\nu|$. We could in fact use that sequence
of quotient graphs, but that would be expensive in both time and space. Our experience 
is that the information we do use, which is essentially information about the quotient 
matrices collected during the refinement process, rarely has less pruning power than the 
quotient matrices themselves would have. 

3.4. Strategies for tree generation

Now we have described the search tree $\mathcal{T}(G, \pi_0)$ as defined by nauty and Traces. In general only a fraction of the search tree is actually generated, since the pruning rules of Section 2.4 are applied. These pruning rules utilise both node invariants, as described in Section 3.3, and automorphisms, which are mainly discovered by noticing that two discrete colourings give the same coloured graph. Now we will describe the order of generation of the tree, which is fundamentally different for nauty and Traces.

In nauty, the tree is generated in depth-first order. The lexicographically least leaf $\nu_1$ is kept. If the canonical labelling is sought (rather than just the automorphism group), the leaf $\nu^*$ with the greatest invariant discovered so far is also kept. A non-leaf node $\nu$ is pruned if neither $\phi(G, \pi_0, \nu) = \phi(G, \pi_0, [\nu_1]_{|\nu|})$ nor $\phi(G, \pi_0, \nu) \ge \phi(G, \pi_0, [\nu^*]_{|\nu|})$. Such operations have both type $P_A(\nu^*, \nu)$ and $P_B(\nu_1, \nu)$, so Theorem 5 applies. Automorphisms are found by discovering leaves equivalent to $\nu_1$ or $\nu^*$, and also to a limited extent from the properties of equitable colourings. Pruning operation $P_C$ is performed wherever possible, as high in the tree as possible (i.e., at the children of the nearest common ancestor of the two leaves found to be equivalent).

Fig. 3. Example of a search tree for the graph of Fig. 1

Until a recent version of nauty, the only automorphisms used for pruning operation $P_C$ were those directly discovered, without any attempt to compose them. Now we use the random Schreier method (Seress 2003) to perform more complete pruning. By Lemma 3, nodes $\nu \,\|\, v_1$ and $\nu \,\|\, v_2$ are equivalent if $v_1, v_2$ belong to the same orbit of the point-wise stabiliser of $\nu$ in $\Gamma$, where $\Gamma$ is the group generated by the automorphisms found so far. This stabiliser could be computed with a deterministic algorithm as proposed by Kocay (1996) and Butler and Lam (1985), but we have found the random Schreier method (Seress 2003) to be more efficient and it doesn't matter if occasionally (due to its probabilistic nature) it computes smaller orbits. The usefulness of this for nauty's efficiency with some classes of difficult graph was demonstrated in 1985 by Kirk (1985) but only made it into the distributed edition of nauty in 2011.
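
To illustrate how orbits drive this kind of pruning, the sketch below computes orbits with a union-find structure. It is a crude stand-in of our own and not the random Schreier method: it uses only those generators that already fix the given vertices, so in general it finds a subgroup of the true stabiliser and hence possibly smaller orbits, which as noted above is harmless for correctness. The two permutations are the generators (0 2)(3 5)(6 8) and (0 6 8 2)(1 3 7 5) of the running example, written as image lists on nine vertices.

```python
def orbits(generators, n, fixed=()):
    """Orbits on {0..n-1} of the group generated by the generators fixing `fixed` pointwise."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path halving
            x = parent[x]
        return x

    for g in generators:
        if all(g[v] == v for v in fixed):     # keep only generators in the stabiliser
            for v in range(n):
                a, b = find(v), find(g[v])
                if a != b:
                    parent[max(a, b)] = min(a, b)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

g = [2, 1, 0, 5, 4, 3, 8, 7, 6]               # (0 2)(3 5)(6 8)
h = [6, 3, 0, 7, 4, 1, 8, 5, 2]               # (0 6 8 2)(1 3 7 5)
```

Two children of a node whose individualized vertices lie in the same orbit are equivalent, so only one vertex per orbit of the target cell needs to be explored.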

Nauty's basic depth-first approach is also followed by Bliss and saucy. However, 
Traces introduces an entirely different order of generating the tree. Some variations are 
possible but we will first describe the normative method, which is based on a breadth-first 
search. Define level $k$ to be the set of nodes $\nu$ with $|\nu| = k$. In the $k$-th phase, Traces computes those nodes $\nu$ in level $k$ which have the greatest value of $\phi(G, \pi_0, \nu)$ on that level. By property ($\phi$1), such nodes are the children of the nodes with greatest $\phi$ on the previous level, so no backtracking is needed. This order of tree generation has the big advantage that pruning operation $P_A$ is used to the maximum possible extent.

Fig. 4. Search tree order for nauty (left) and Traces (right)

As mentioned in Section 3.3, the node invariant $\phi(G, \pi_0, \nu)$ is computed incrementally during the refinement process, so that pruning operation $P_A$ can often be applied when the refinement is only partly complete.

An apparent disadvantage of breadth-first order is that pruning by automorphisms (operation $P_C$) is only possible when automorphisms are known, which in general requires leaves of the tree. To remedy this problem, for every node a single path, called an "experimental path", is generated from that node down to a leaf of the tree. Automorphisms are found by comparing the labelled graphs that correspond to those leaves, with the value of $\phi(G, \pi_0, \nu)$ at the leaf being used to avoid most unnecessary comparisons. We have found experimentally that generating experimental paths randomly tends to find automorphisms that generate larger subgroups, so that the group requires fewer generators altogether and more of the group is available early for pruning.

The group generated by the automorphisms found so far is maintained using the 
random Schreier method. Some features of the Schreier method are turned on and off in 
Traces when it is possible to heuristically infer their computational weight. 

Figure 4 continues the example of Figure 3, showing the portion of the search tree traversed by nauty (left) and Traces (right). Node labels indicate the order in which nodes are visited, and edge labels indicate which vertex is individualized. During its backtrack search, nauty stores the first leaf (2) for comparison with subsequent leaves. Leaves 2 and 3 provide the generator (0 2)(3 5)(6 8), which for example allows pruning of the greyed subtree formed by individualizing vertex 5 at the root. Traces executes a breadth-first search, storing with each visited node the discrete partition obtained by a randomly chosen experimental path (shown by green arrow). After processing node 2 of the tree, the experimental leaves 1 and 2 are compared, revealing the generator (0 6 8 2)(1 3 7 5), which allows for pruning the greyed subtrees formed by individualizing vertices 5 and 7 at the root.



3.5. Detection of automorphisms 



The primary way that automorphisms are detected, in all the programs under consid- 
eration, is to compare the labelled graphs corresponding to leaves of the search tree as 
described above. 



An important innovation of saucy (Darga et al. 2008) was to detect some types of automorphism higher in the tree. Suppose that $\pi, \pi'$ are equitable colourings with the same number of vertices of each colour. Any automorphism of $(G, \pi_0)$ that takes $\pi$ onto $\pi'$ has known action on the fixed vertices of $\pi$: it maps them to the fixed vertices of $\pi'$ with the same colours. In some cases that saucy can detect very quickly, this partial mapping is an automorphism when extended as the identity on the non-fixed vertices.
This happens, for example, when a component of G is completely fixed by two different 
but equivalent stabilization sequences. This is one of the main reasons saucy can be very 
fast on graphs with many automorphisms that move few vertices. 
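
The idea can be sketched as follows. This is our own simplified illustration and not saucy's implementation: colourings are ordered lists of cells on vertices $0, \ldots, n-1$, the function names are hypothetical, and the candidate is verified by a full adjacency check rather than by any fast test. The example graph consists of two disjoint paths 0-1-2 and 3-4-5, and the two colourings are what individualizing vertex 0 or vertex 2 followed by refinement would give.

```python
def fixed_point_candidate(cells1, cells2, n):
    """Map singleton cells of cells1 to the same-coloured singletons of cells2, identity elsewhere.

    Returns None if the cell sizes differ or the result is not a permutation."""
    if [len(c) for c in cells1] != [len(c) for c in cells2]:
        return None
    g = list(range(n))
    for c1, c2 in zip(cells1, cells2):
        if len(c1) == 1:
            g[c1[0]] = c2[0]
    return g if sorted(g) == list(range(n)) else None

def is_automorphism(adj, g):
    """Check that g maps the neighbourhood of every vertex onto that of its image."""
    return all({g[u] for u in adj[v]} == adj[g[v]] for v in adj)

adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3, 5}, 5: {4}}
pi1 = [[0], [3, 5], [2], [4], [1]]     # after individualizing 0
pi2 = [[2], [3, 5], [0], [4], [1]]     # after individualizing 2
g = fixed_point_candidate(pi1, pi2, 6)
```

Here the first path is completely fixed by both colourings, and the candidate swaps its two ends while leaving the second path alone, without any need to descend to a leaf.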




Fig. 5. Traces search strategies for canonical labelling or automorphism group

Traces extends this idea by finding many automorphisms that do not require the 
identity mapping on the non-trivial vertices. It does this by a heuristic that extends the 
mapping from the fixed vertices to the non-fixed vertices, which is applied in certain 
situations where it is more likely to succeed. 

When Traces is only looking for the automorphism group, and not for a canonical 
labelling, it employs another strategy which is sometimes much faster. Suppose that while 
generating the nodes on some level L, it notices (during experimental path generation) 
that one of them, say v, has a child which is discrete. At this point, Traces determines 
and keeps all the discrete children of v (modulo the usual automorphism pruning). Now, 
for all nodes v' on level L, a single discrete child v" is found, if any, and an automorphism 
is discovered if it is equivalent to any child of v. The validity of this approach follows 
from Theorem 5 with the role of v₀ played by the first discrete child of v. 

Figure 5 (left) shows the whole tree up to level L+1, where a node labelled by X 
represents a discrete partition corresponding to labelled graph X, while an unlabelled 
(and smaller) node stands for a non-discrete partition. Figure 5 (center) shows the part 
of the tree which is traversed by Traces during the search for a canonical labelling. Only 
the best leaf is kept for comparison with subsequent discrete partitions. 

Figure 5 (right) shows the part of the tree which is traversed by Traces during an 
automorphism group computation. All the discrete children of v are kept for comparison 
with subsequent discrete partitions. When the first discrete partition is found as a child 
of a node v' at level L, either it has the same labelled graph as one of those stored, or the 
whole subtree rooted at v' has no leaf with one of the stored graphs. In the first case, an 
automorphism is found. In both cases, the computation is resumed from the next node 
at level L. 

3.6. Low degree vertices 

Graphs in some applications, such as constraint satisfaction problems described by 
Darga et al. (2004), have many small components with vertices of low degree, vertices 
with common neighborhoods, and so on. Saucy handles them efficiently by a refinement 
procedure tuned to this situation plus early detection of sparse automorphisms. Traces 
employs another method. Recall that after the first refinement vertices with equal colours 
also have equal degrees. The target cell selector never selects cells containing vertices of 
degree 0, 1, 2 or n-1, and nodes whose non-trivial cells are only of those degrees are not 
expanded further. Special-purpose code then produces generators for the automorphism 
group fixed by the node and, if necessary, a unique discrete colouring that refines the 
node. 
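
The degree filter in the target cell selector can be sketched as below. The sketch assumes a colouring given as a list of cells and a list of vertex degrees. Choosing the first eligible cell is a placeholder only, since the actual selection rule of Traces is not reproduced here, and the name select_target_cell is hypothetical.

```python
def select_target_cell(cells, degree, n):
    """Return the index of the first eligible non-trivial cell, or None."""
    skipped = {0, 1, 2, n - 1}  # degrees left to the special-purpose code
    for index, cell in enumerate(cells):
        # After the first refinement all vertices in a cell share one degree,
        # so inspecting a single representative is enough.
        if len(cell) > 1 and degree[cell[0]] not in skipped:
            return index
    return None  # nothing eligible: the node is not expanded further


cells = [[0, 1], [2, 3, 4], [5]]
degree = [1, 1, 3, 3, 3, 2]
chosen = select_target_cell(cells, degree, 6)
```

A return value of None corresponds to a node whose non-trivial cells all have degree 0, 1, 2 or n-1, which is the point at which the special-purpose code takes over.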

This technique is quite successful. However, in our opinion, graphs of this type ought to 
be handled by preprocessing. For example, sets of vertices with the same neighborhoods 
ought to be replaced by single vertices with a colour that encodes the multiplicity. All 
tree-like appendages, long paths of degree 2 vertices, and similar easy subgraphs, could 
be efficiently factored out in this manner. 
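
One such preprocessing step, the merging of vertices with identical neighbourhoods, might look like the following sketch. It assumes a simple, undirected graph with no initial colouring and performs a single pass; any new twins created by the merge are not handled. The function name collapse_twins and the choice of the smallest vertex as representative are assumptions made for illustration.

```python
from collections import defaultdict


def collapse_twins(n, edges):
    """Merge vertices with equal neighbourhoods; the multiplicity serves as the colour."""
    neighbours = [set() for _ in range(n)]
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    groups = defaultdict(list)
    for v in range(n):
        groups[frozenset(neighbours[v])].append(v)
    representative, multiplicity = {}, {}
    for members in groups.values():
        r = min(members)  # keep the smallest vertex of each group
        multiplicity[r] = len(members)
        for v in members:
            representative[v] = r
    # Twins are never adjacent to each other, so no loops can arise here.
    reduced = {frozenset((representative[u], representative[v])) for u, v in edges}
    return multiplicity, sorted(tuple(sorted(e)) for e in reduced)


# Star with centre 0 and leaves 1, 2, 3: the three leaves share one neighbourhood.
colours, reduced_edges = collapse_twins(4, [(0, 1), (0, 2), (0, 3)])
```

For the star, the three leaves collapse to a single vertex of multiplicity 3, leaving one edge. The automorphisms that permute the merged vertices among themselves can then be accounted for separately rather than found by search.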



4. Performance 

In the following figures, we present some comparisons between programs for a variety 
of graphs ranging from very easy to very difficult. We made an effort to include graphs 
that are easy and difficult for each of the programs tested. 

Most of the graphs are taken from the Bliss collection, but for the record we provide 
all of our test graphs at the nauty and Traces website (McKay and Piperno 2012a). 

The times given are for a MacBook Pro with 2.66 GHz Intel i7 processor, compiled 
using gcc 4.7 and running in a single thread. Easy graphs were processed multiple times 
to give more precise times. In order to avoid non-typical behaviour due to the input 
labelling, all the graphs were randomly labelled before processing. In some classes, such as 
the "combinatorial graphs", the processing time can depend a lot on the initial labelling; 
the plots show whatever happened in our tests. 
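
The random relabelling step can be sketched as follows. The function name random_relabel and the seed parameter are assumptions made for illustration and do not describe the scripts used for these experiments.

```python
import random


def random_relabel(n, edges, seed=None):
    """Apply a uniformly random permutation to the vertex labels of a graph."""
    rng = random.Random(seed)
    perm = list(range(n))
    rng.shuffle(perm)  # perm[v] is the new label of vertex v
    return perm, [(perm[u], perm[v]) for u, v in edges]


# Relabel a path on five vertices; the result is isomorphic to the input.
perm, shuffled_edges = random_relabel(5, [(0, 1), (1, 2), (2, 3), (3, 4)], seed=1)
```

Fixing the seed makes a particular relabelling reproducible, which is convenient when the running time depends strongly on the initial labelling.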

The following programs were included. Programs (c)-(e) reflect their distributed ver- 
sions at the end of October 2012. 
(a) nauty version 2.5 

(b) Traces version 2.0 

(c) saucy version 3.0 

(d) Bliss version 7.2 

(e) conauto version 2.0.1 

The first column of plots in each figure is for computation of the automorphism group 
alone. The second column is for computation of a canonical labelling, which for all the 
programs here includes an automorphism group computation. 

For nauty we used the dense or sparse version consistently within each class, depending 
on whether the class is inherently dense or sparse. We did not use an invariant except 
where indicated, even though it would often help. 

Saucy does not have a canonical labelling option. Version 3.0, which was released 
just as this paper neared completion, has an amalgam of saucy and Bliss that can do 
canonical labelling, but we have not tested it much. 

Conauto features automorphism group computation and the ability to test two 
graphs for isomorphism. We decided that the latter is outside the scope of this study. For 
the same reason we did not include the program of Foggia et al. (2001) in our comparisons. 

Another excellent program, that we were unfortunately unable to include for technical 
reasons, is due to Stoichev (2010). Many more experiments and comments can be found 
at http://pallini.di.uniroma1.it. 



5. Conclusions 

We have brought the published description of nauty up to date and introduced the 
program Traces. In particular, we have shown that the highly innovative tree scanning 
algorithm introduced by Traces can have a remarkable effect on the processing power. 
Although none of the programs tested have the best performance on all graph classes, it is 
clear that Traces is currently the leader on the majority of difficult graph classes tested, 
while nauty is still preferred for mass testing of small graphs. An exception is provided 
by some classes of graphs consisting of disjoint or minimally-overlapping components, 
here represented by non-disjoint unions of tripartite graphs. Conauto and Bliss (Junttila 
and Kaski 2011) have special code for such graphs, but as yet nauty and Traces do not. 



We wish to thank Gordon Royle for many useful test graphs. We also thank the authors 
of saucy, Bliss and conauto for many useful discussions. The second author is indebted 
to Riccardo Silvestri for his strong encouragement and valuable suggestions. 

Panel: Random cubic graphs (nauty invariant distances (2)) 
Legend: Bliss, saucy, conauto, nauty, nauty with invariant, Traces 
Fig. 6. Performance comparison (horizontal: number of vertices; vertical: time in seconds) 

Panels: Hypercubes (vertex-transitive); Miscellaneous vertex-transitive graphs; (Non-disjoint) union of tripartite graphs 
Columns: Automorphism group; Canonical label 
Fig. 7. Performance comparison (horizontal: number of vertices; vertical: time in seconds) 

Columns: Automorphism group 
Fig. 8. Performance comparison (horizontal: number of vertices; vertical: time in seconds) 

Columns: Automorphism group; Canonical label 
Fig. 9. Performance comparison (horizontal: number of vertices; vertical: time in seconds) 

Panels: Automorphism groups of projective planes of order 16 (regular bipartite graphs of order 546 and degree 17); Automorphisms of some combinatorial graphs; Canonical labelling of the above graphs 
Fig. 10. Performance comparison (horizontal: graph number; vertical: time in seconds) 